@Watson1978
Contributor

@Watson1978 Watson1978 commented Nov 4, 2025

Backport #5104

Which issue(s) this PR fixes:
Fixes #4396

What this PR does / why we need it:
Adds a timeout mechanism to the establish_connection method to prevent an infinite loop when the handshake protocol gets stuck. In unstable network environments with proxy components, if the connection drops during the handshake after TLS establishment, Fluentd gets stuck in an infinite loop, which stops logs from being flushed. This fix uses the existing hard_timeout configuration to break the loop, disable problematic nodes, and maintain log flow through healthy nodes.

Docs Changes:
None required - uses existing hard_timeout configuration parameter.

Release Note:
Fix infinite loop in out_forward handshake protocol that could cause logs to stop being flushed in unstable network environments.
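
For readers who want a feel for the idea, here is a minimal sketch of a deadline-bounded handshake loop. It is not the code from this PR or from Fluentd: `FakeHandshake`, `HandshakeTimeout`, `establish_with_deadline`, and `pick_healthy_nodes` are hypothetical names made up for illustration, and only the `hard_timeout` name is borrowed from the out_forward parameter mentioned above.

```ruby
# Hypothetical stand-in for a node's handshake state (not a Fluentd class).
class FakeHandshake
  def initialize(steps_until_done)
    @remaining = steps_until_done # nil means the handshake never completes
  end

  # Advance one step; returns true once the handshake has finished.
  def advance
    return false if @remaining.nil?

    @remaining -= 1
    @remaining <= 0
  end
end

# Hypothetical error raised when the deadline passes.
class HandshakeTimeout < StandardError; end

# Poll the handshake until it finishes or the deadline passes.
# hard_timeout is in seconds; a monotonic clock avoids wall-clock jumps.
def establish_with_deadline(handshake, hard_timeout:, poll_interval: 0.01)
  deadline = Process.clock_gettime(Process::CLOCK_MONOTONIC) + hard_timeout
  loop do
    return :established if handshake.advance

    if Process.clock_gettime(Process::CLOCK_MONOTONIC) >= deadline
      raise HandshakeTimeout, "handshake did not finish within #{hard_timeout}s"
    end

    sleep poll_interval
  end
end

# Try every node; a node whose handshake times out is marked disabled
# so the remaining healthy nodes can keep receiving data.
def pick_healthy_nodes(nodes, hard_timeout:)
  nodes.each_with_object({ healthy: [], disabled: [] }) do |(name, handshake), acc|
    begin
      establish_with_deadline(handshake, hard_timeout: hard_timeout)
      acc[:healthy] << name
    rescue HandshakeTimeout
      acc[:disabled] << name
    end
  end
end
```

In this sketch a stuck node costs at most `hard_timeout` seconds before it is set aside, instead of blocking the loop forever. How the real change disables nodes and resumes flushing is in the diff of #5104.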

@ashie
Member

ashie commented Nov 21, 2025

Same status as #5138 (comment)

Hmm, we need to investigate the CI failure in https://github.com/fluent/fluentd/actions/runs/19123383854/job/54648369661

1) Error: test: Node with security is thread-safe on multi threads(ForwardOutputTest): TypeError: wrong argument type nil (expected Data)
C:/hostedtoolcache/windows/Ruby/3.2.9/x64/lib/ruby/gems/3.2.0/gems/cool.io-1.9.0/lib/cool.io/loop.rb:88:in `run_once'
C:/hostedtoolcache/windows/Ruby/3.2.9/x64/lib/ruby/gems/3.2.0/gems/cool.io-1.9.0/lib/cool.io/loop.rb:88:in `run'
D:/a/fluentd/fluentd/lib/fluent/plugin_helper/event_loop.rb:93:in `block in start'
D:/a/fluentd/fluentd/lib/fluent/plugin_helper/thread.rb:78:in `block in thread_create'

So, we discussed that this backport will be released in 1.19.2 or later.

@Watson1978 Watson1978 force-pushed the backport-to-1.16/pr5104 branch 2 times, most recently from 530c20a to 3e89308 Compare December 5, 2025 02:10
@Watson1978
Contributor Author

Without this change, it appears to fail sometimes in the v1.16 branch as well....

…op (#5104)

**Which issue(s) this PR fixes**:
Fixes #4396

**What this PR does / why we need it**:
Adds timeout mechanism to `establish_connection` method to prevent
infinite loop when handshake protocol gets stuck. In unstable network
environments with proxy components, if connection drops during handshake
after TLS establishment, Fluentd gets stuck in infinite loop causing
logs to stop being flushed. This fix uses existing `hard_timeout`
configuration to break the loop, disable problematic nodes, and maintain
log flow through healthy nodes.

**Docs Changes**:
None required - uses existing `hard_timeout` configuration parameter.

**Release Note**:
Fix infinite loop in out_forward handshake protocol that could cause
logs to stop being flushed in unstable network environments.

Signed-off-by: Ian Driver <[email protected]>
Co-authored-by: Ian Driver <[email protected]>
Signed-off-by: Shizuo Fujita <[email protected]>
@Watson1978 Watson1978 force-pushed the backport-to-1.16/pr5104 branch from 3e89308 to 5acd606 Compare December 5, 2025 07:47
Contributor

@kenhys kenhys left a comment

For the last milestone of v1.16, this might be reasonable.

@kenhys kenhys merged commit 18fe509 into v1.16 Dec 5, 2025
24 checks passed
@kenhys kenhys deleted the backport-to-1.16/pr5104 branch December 5, 2025 08:28